[v1] Refactor KVCacheConfig #14079
Conversation
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀
Left some minor comments. In general LGTM!
vllm/v1/kv_cache_interface.py (Outdated)
@@ -87,6 +84,18 @@ class KVCacheTensor:
    size: int  # The size of KV cache Tensor in bytes

@dataclass
class VirtualLayer:
Why do we call this `VirtualLayer`? I believe Woosuk and you had some discussion about this. The problem with `VirtualLayer` is that you can't tell it is related to KV cache and KV cache grouping. To me, a name like "GroupedLayerKV" would be more straightforward.
Discussing in the #hybrid-mem channel on Slack.
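For readers following this thread, here is a minimal sketch of what such a grouping structure might contain, assuming it groups layers that share the same per-layer KV cache spec. The class and field names below are illustrative placeholders, not the actual definitions added in this PR:

```python
from dataclasses import dataclass


@dataclass
class KVCacheSpec:
    """Placeholder for the per-layer spec; the real class in
    vllm/v1/kv_cache_interface.py has more fields."""
    block_size: int


@dataclass
class VirtualLayer:
    """Sketch: a group of real layers that share one KV cache spec."""
    # Names of the model layers represented by this "virtual layer".
    # Because they share the spec below, the KV cache manager can treat
    # them as a single group when allocating blocks.
    layer_names: list[str]
    # The per-layer KV cache spec common to all layers in this group.
    kv_cache_spec: KVCacheSpec
```

Whatever the class ends up being called, the concern above is that the name "virtual layer" alone does not signal this grouping role.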
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
@zhuohan123 Updated the PR based on your review. More discussion on the name of `VirtualLayer` is needed.
Signed-off-by: Chen Zhang <[email protected]>
vllm/v1/core/kv_cache_utils.py (Outdated)
    raise NotImplementedError

def make_kv_cache_configs_consistent(kv_cache_configs: list[KVCacheConfig]):
Looks more like `unify_kv_cache_configs`?
vllm/v1/core/kv_cache_utils.py (Outdated)
    # Change the num_blocks of each rank to the smallest among all ranks. We
    # do not need to shrink the tensor size because it is valid to only use the
    # first `num_blocks` blocks of the tensor.
    num_blocks = min(kv_cache_config.num_blocks
nit:
- num_blocks = min(kv_cache_config.num_blocks
+ min_num_blocks = min(kv_cache_config.num_blocks
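For illustration, a minimal sketch of the unification step described in the excerpt above, assuming `KVCacheConfig` exposes a mutable `num_blocks` field (the simplified class here is a stand-in, not the real definition):

```python
from dataclasses import dataclass


@dataclass
class KVCacheConfig:
    """Simplified stand-in for the per-rank KV cache config."""
    num_blocks: int


def unify_kv_cache_configs(kv_cache_configs: list[KVCacheConfig]) -> None:
    """Shrink every rank's num_blocks to the smallest value among all ranks.

    The underlying tensors do not need to be resized, because it is valid
    to use only the first `num_blocks` blocks of each tensor.
    """
    min_num_blocks = min(cfg.num_blocks for cfg in kv_cache_configs)
    for cfg in kv_cache_configs:
        cfg.num_blocks = min_num_blocks


# Example: three ranks that computed different block counts end up aligned.
configs = [KVCacheConfig(1024), KVCacheConfig(1000), KVCacheConfig(1012)]
unify_kv_cache_configs(configs)
assert all(cfg.num_blocks == 1000 for cfg in configs)
```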
vllm/v1/worker/gpu_model_runner.py (Outdated)
                dtype=dtype,
                device=self.device)
        else:
            raise NotImplementedError
Add a TODO
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
@zhuohan123 @comaniac Updated the PR based on your comments. Can you review it again?
LGTM. Approve to unblock. Left to @WoosukKwon and @zhuohan123
Please fix the merge conflict.
Signed-off-by: Chen Zhang <[email protected]>
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: Louis Ulmer <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: Mu Huai <[email protected]>
This PR makes the following changes to KVCacheConfig:
1. Change `KVCacheSpec` from the spec of the whole model to the spec of one layer.
2. `KVCacheConfig` class: save the kv_cache_spec for each KV cache group instead of each layer.
3. Make the KV cache configs of different ranks consistent (`make_kv_cache_configs_consistent`).
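Tying these points together, a rough sketch of how the refactored layout might look, with the per-layer spec stored once per KV cache group rather than once per layer. All names here are illustrative; the actual definitions live in vllm/v1/kv_cache_interface.py:

```python
from dataclasses import dataclass, field


@dataclass
class KVCacheSpec:
    """Per-layer KV cache spec (simplified)."""
    block_size: int


@dataclass
class VirtualLayer:
    """A KV cache group: layers that share one spec."""
    layer_names: list[str]
    kv_cache_spec: KVCacheSpec


@dataclass
class KVCacheConfig:
    """Per-rank KV cache configuration after the refactor."""
    num_blocks: int
    # One entry per KV cache group instead of one spec per layer.
    virtual_layers: list[VirtualLayer] = field(default_factory=list)


# Example: two full-attention layers with identical specs form a single group.
spec = KVCacheSpec(block_size=16)
config = KVCacheConfig(
    num_blocks=1024,
    virtual_layers=[
        VirtualLayer(["model.layers.0.attn", "model.layers.1.attn"], spec)
    ],
)
```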